feat(yp): faster typescript serialization by AztecBot · Pull Request #23713 · AztecProtocol/aztec-packages

AztecBot · 2026-05-29T19:54:31Z

Result: 3-4x faster typescript serialization.

Adds another mode to toBuffer: if a buffer 'sink' is passed, it writes to that instead of returning a new buffer. Thus for a top-level toBuffer() call, instead of

a.toBuffer() = a.b.toBuffer() + a.c.toBuffer()

we have, approximately

a.toBuffer() = {
  sink = new Sink()
  a.b.toBuffer(sink)
  a.c.toBuffer(sink)
  sink.toBuffer();
}

This allows us to avoid intermediate buffer allocations. Additionally, we used trial and error to find variants that V8 optimizes well; the measured gains are hopefully representative.
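
The pseudocode above can be sketched as runnable TypeScript. This is a minimal illustration under stated assumptions, not the PR's implementation: the `Sink`, `A`, `B`, and `C` classes are hypothetical, and `Uint8Array` stands in for Node's `Buffer` so the sketch has no dependencies.

```typescript
// Hypothetical minimal sink: one growable byte array shared by the whole object graph.
class Sink {
  private bytes = new Uint8Array(8);
  private length = 0;

  // Grow by doubling until `extra` more bytes fit.
  private ensure(extra: number): void {
    const needed = this.length + extra;
    if (needed <= this.bytes.length) return;
    let capacity = this.bytes.length;
    while (capacity < needed) capacity *= 2;
    const grown = new Uint8Array(capacity);
    grown.set(this.bytes.subarray(0, this.length));
    this.bytes = grown;
  }

  // Append a 32-bit unsigned integer, big-endian.
  writeUint32(value: number): void {
    this.ensure(4);
    this.bytes[this.length++] = (value >>> 24) & 0xff;
    this.bytes[this.length++] = (value >>> 16) & 0xff;
    this.bytes[this.length++] = (value >>> 8) & 0xff;
    this.bytes[this.length++] = value & 0xff;
  }

  // Slice once at the root: copy out exactly the bytes written.
  toBuffer(): Uint8Array {
    return this.bytes.slice(0, this.length);
  }
}

// Children write straight into the shared sink and allocate nothing themselves.
class B {
  constructor(private readonly x: number) {}
  toBuffer(sink: Sink): void {
    sink.writeUint32(this.x);
  }
}

class C {
  constructor(private readonly y: number, private readonly z: number) {}
  toBuffer(sink: Sink): void {
    sink.writeUint32(this.y);
    sink.writeUint32(this.z);
  }
}

// The root owns the sink, streams every child into it, then slices once.
class A {
  constructor(private readonly b: B, private readonly c: C) {}
  toBuffer(): Uint8Array {
    const sink = new Sink();
    this.b.toBuffer(sink);
    this.c.toBuffer(sink);
    return sink.toBuffer();
  }
}

const encoded = new A(new B(1), new C(2, 3)).toBuffer();
```

Here `encoded` holds the three integers back to back; the only allocations are the sink's backing array (plus one growth step, since the sketch deliberately starts at 8 bytes) and the final slice.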

…path

Replace the recursive Tx.toBuffer() chain (Buffer alloc at every node, Buffer.concat at every level) with a single growable ArrayBuffer the whole object graph streams into and that is sliced once at the root.

The migration contract is the optional-sink overload:

    toBuffer(): Buffer;
    toBuffer(sink: BufferSink): void;

Pass a sink and it writes + returns undefined; omit it and it returns its own buffer. Unmigrated children fall back via return value, so it lands incrementally and existing toBuffer() callers keep working.

Converts the Tx spine end-to-end: Tx/TxArray, TxHash/TxHashArray, PrivateKernelTailCircuitPublicInputs (+partials), PrivateToRollupAccumulatedData, ChonkProof/ChonkProofWithPublicInputs, HashedValues, Vector, BaseField.

BufferSink.writeBigInt uses 4x DataView.setBigUint64 limbs for 32-byte fields (no hex round-trip, no per-field alloc). On a modeled rollup Tx (~2660 fields) byte-identical to today and ~11x faster end-to-end; the naive per-byte shift loop is actually slower than legacy, so picking the right field encoder is the win.

Adds toBuffer cases (private + public) to stdlib/src/tx/tx_bench.test.ts recording per-op microseconds + payload bytes; wired into CI via the existing bench_cmds entry, dashboard series Tx/{private,public}/toBuffer/*.

fromBuffer/zod path is unchanged and out of scope.
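
A minimal sketch of the optional-sink overload and the return-value fallback described above. The class names (`MigratedLeaf`, `UnmigratedLeaf`, `Parent`) and the `writeChild` helper are hypothetical, this `BufferSink` models only append and slice, and `Uint8Array` stands in for `Buffer`; the real module may differ.

```typescript
// Hypothetical stand-in for BufferSink: append bytes, slice at the end.
class BufferSink {
  private bytes = new Uint8Array(32);
  private length = 0;

  writeBytes(data: Uint8Array): void {
    const needed = this.length + data.length;
    if (needed > this.bytes.length) {
      const grown = new Uint8Array(Math.max(this.bytes.length * 2, needed));
      grown.set(this.bytes.subarray(0, this.length));
      this.bytes = grown;
    }
    this.bytes.set(data, this.length);
    this.length += data.length;
  }

  slice(): Uint8Array {
    return this.bytes.slice(0, this.length);
  }
}

// A migrated node: with a sink it writes and returns nothing, without one it returns its own bytes.
class MigratedLeaf {
  constructor(private readonly value: number) {}
  toBuffer(): Uint8Array;
  toBuffer(sink: BufferSink): void;
  toBuffer(sink?: BufferSink): Uint8Array | void {
    if (sink) {
      sink.writeBytes(Uint8Array.of(this.value));
      return;
    }
    const own = new BufferSink();
    this.toBuffer(own);
    return own.slice();
  }
}

// An unmigrated node: it only knows the legacy signature and ignores any argument.
class UnmigratedLeaf {
  constructor(private readonly value: number) {}
  toBuffer(): Uint8Array {
    return Uint8Array.of(this.value);
  }
}

// Fallback via return value: if the child handed bytes back, copy them into the sink.
function writeChild(sink: BufferSink, child: { toBuffer(sink: BufferSink): Uint8Array | void }): void {
  const returned = child.toBuffer(sink);
  if (returned instanceof Uint8Array) sink.writeBytes(returned);
}

class Parent {
  constructor(private readonly a: MigratedLeaf, private readonly b: UnmigratedLeaf) {}
  toBuffer(): Uint8Array;
  toBuffer(sink: BufferSink): void;
  toBuffer(sink?: BufferSink): Uint8Array | void {
    const target = sink ?? new BufferSink();
    writeChild(target, this.a);
    writeChild(target, this.b);
    if (sink) return;
    return target.slice();
  }
}

const parent = new Parent(new MigratedLeaf(1), new UnmigratedLeaf(2));
const standalone = parent.toBuffer(); // legacy call shape still works
const shared = new BufferSink();
parent.toBuffer(shared); // streaming call shape writes into the caller's sink
```

Both call shapes produce the same bytes, which is what lets a tree be migrated one node at a time.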

…r ~9x Tx.toBuffer

Real bench (stdlib/src/tx/tx_bench.test.ts) on this PR's spine-only baseline vs after:
- Tx/private/toBuffer: 1.96 ms -> 0.22 ms (~8.9x)
- Tx/public/toBuffer: 3.11 ms -> 0.34 ms (~9.1x)
- Tx/private/toBufferReusedSink: 1.86 ms -> 0.16 ms (~12x)
- Tx/public/toBufferReusedSink: 3.04 ms -> 0.29 ms (~10.5x)

cpu-prof on the prior code showed serializeToSink dominating ~50% of total time: the rest-args + Array.isArray + Buffer.isBuffer + 5x typeof dispatch ran per element of every nested array, and serializeToSink(sink, ...obj) allocated a fresh rest-args array for each recursion (1632-element spread per ChonkProof, every call).

Changes:
- foundation/serialize/buffer_sink: split dispatch into per-element serializeOneToSink and an inner serializeArrayToSinkInner that recurses with the array reference, no spread. Hot-path objects exposing toBuffer are checked first so Fr/Fq/migrated leaves skip the primitive-typeof chain. serializeArrayToSink uses the same inner.
- foundation/curves/bn254/field: BaseField caches its 32-byte serialized form. The cache is populated eagerly in the constructor when built from a 32-byte Buffer (the deserialization path) and lazily on first toBuffer otherwise. toBuffer returns a defensive Buffer.from copy or writes the cached bytes straight into a sink, with no bigint->bytes round-trip on the hot path. The Buffer ctor copies via new Uint8Array to defend against caller-side mutation; the copy-ctor aliases the cache (it is never mutated post-assignment).
- stdlib/tx/tx: pre-size the BufferSink with the last serialized length so the no-sink fresh-allocation path skips the 1k->64k doubling-growth cost. Hint lives in a module-level WeakMap rather than an instance field so deep-equality assertions on Tx (which compare enumerable own properties) are unaffected.
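
The dispatch split can be sketched roughly as follows. The function names match the commit message, but the bodies, the `Sinkable` type, and the `ByteSink` class are assumptions made for illustration (with `Uint8Array` standing in for `Buffer`); keeping a variadic `serializeToSink` entry point that pays the rest-args cost once is also an assumption.

```typescript
// Hypothetical sink with just enough surface for the dispatch sketch.
class ByteSink {
  private bytes = new Uint8Array(16);
  private length = 0;

  private ensure(extra: number): void {
    const needed = this.length + extra;
    if (needed <= this.bytes.length) return;
    const grown = new Uint8Array(Math.max(this.bytes.length * 2, needed));
    grown.set(this.bytes.subarray(0, this.length));
    this.bytes = grown;
  }

  writeBytes(data: Uint8Array): void {
    this.ensure(data.length);
    this.bytes.set(data, this.length);
    this.length += data.length;
  }

  writeUint32(value: number): void {
    this.ensure(4);
    this.bytes[this.length++] = (value >>> 24) & 0xff;
    this.bytes[this.length++] = (value >>> 16) & 0xff;
    this.bytes[this.length++] = (value >>> 8) & 0xff;
    this.bytes[this.length++] = value & 0xff;
  }

  slice(): Uint8Array {
    return this.bytes.slice(0, this.length);
  }
}

interface SinkWritable {
  toBuffer(sink: ByteSink): void;
}
type Sinkable = number | Uint8Array | SinkWritable | Sinkable[];

function isSinkWritable(item: Sinkable): item is SinkWritable {
  return typeof item === 'object' && typeof (item as SinkWritable).toBuffer === 'function';
}

// Per-element dispatch: objects exposing toBuffer are checked first, primitives last.
function serializeOneToSink(sink: ByteSink, item: Sinkable): void {
  if (isSinkWritable(item)) item.toBuffer(sink);
  else if (Array.isArray(item)) serializeArrayToSinkInner(sink, item);
  else if (item instanceof Uint8Array) sink.writeBytes(item);
  else sink.writeUint32(item);
}

// Recurses with the array reference itself, so no rest-args array is allocated per level.
function serializeArrayToSinkInner(sink: ByteSink, items: Sinkable[]): void {
  for (const item of items) serializeOneToSink(sink, item);
}

// Variadic entry point: the rest-args array is created once here, not on every recursion.
function serializeToSink(sink: ByteSink, ...items: Sinkable[]): void {
  serializeArrayToSinkInner(sink, items);
}

const dispatchSink = new ByteSink();
const leaf: SinkWritable = { toBuffer: (s: ByteSink) => s.writeUint32(4) };
serializeToSink(dispatchSink, 1, [2, [3, new Uint8Array([9])]], leaf);
const dispatched = dispatchSink.slice();
```

The nested arrays are walked in place, so the output is the flat concatenation of every element in order.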

…ench

The previous commit's BaseField byte cache helped the existing steady-state bench (2051 calls per Tx) but adds a 32-byte Uint8Array alloc plus a Buffer.from copy on every cold-path Fr.toBuffer call. The synthetic bench was a misleading measurement since prod typically serializes each Tx once.

Measured impact, dispatch fix + sink presize only (this commit) vs. with the cache:

variant          no cache   with cache
private steady   0.28 ms    0.22 ms (cache +20%)
private cold     0.31 ms    1.00 ms (cache -3.2x, real regression)
public steady    0.38 ms    0.34 ms
public cold      0.37 ms    ~1 ms

The dispatch fix + WeakMap sink-size hint already give ~7-10x vs. the spine-only baseline (1.94 ms / 3.22 ms) without any state held on Fr instances, no deserialize-time copies, no extra memory per long-lived Tx in the mempool.

Also adds two cold-start bench cases (one toBuffer per fresh Tx, no warm cache, no sink reuse) so the dashboard tracks the realistic per-tx cost alongside the steady-state numbers, and a future byte-cache attempt can be evaluated honestly.

…hint

Three small adds on top of the dispatch-fix + sink-presize commits, all without the byte-cache tradeoff:
- foundation/serialize/buffer_sink: split a no-width writeField(value) off writeBigInt so V8 can specialize the Fr/Fq 32-byte limb encoder without the wider routine's width branch. Add writeFields(arr) which iterates a flat field-element array inline with no per-element Sinkable dispatch.
- foundation/curves/bn254/field: BaseField.toBuffer(sink) now calls writeField.
- stdlib/proofs/chonk_proof: both proof classes switch the 1632-element field vector from serializeToSink(... this.fields) to sink.writeFields(this.fields), skipping per-Fr serializeOneToSink dispatch on the largest leaf array in a Tx.
- stdlib/tx/tx: fall back to a process-wide largest-seen Tx size when the per-instance WeakMap sink-size hint is missing, so the no-sink fresh-allocation cold path (different Tx every call) also benefits from sink pre-sizing once any Tx in the process has been serialized.

Best-of-3 AVG us/op vs. spine-only baseline (1940 / 3220):

variant          current   spine baseline   speedup
private steady   220       1940             ~8.8x
public steady    325       3220             ~9.9x
private reused   167       1860             ~11.1x
public reused    266       3040             ~11.4x
private cold     ~275      -                -
public cold      ~445      -                -

Cold numbers are inherently noisier (each timed call serializes a different Tx with different field shapes, so V8 inline caches churn) but stay well below the steady baseline.
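
A rough sketch of what a fixed-width `writeField` and a flat `writeFields` could look like. The method names come from the commit message; the `FieldSink` class, its layout, and the use of raw `bigint` values in place of Fr/Fq objects are assumptions for illustration.

```typescript
// Hypothetical sink exposing only the 32-byte field writers.
class FieldSink {
  private bytes: Uint8Array;
  private view: DataView;
  private length = 0;

  constructor(capacity = 64) {
    this.bytes = new Uint8Array(capacity);
    this.view = new DataView(this.bytes.buffer);
  }

  private ensure(extra: number): void {
    const needed = this.length + extra;
    if (needed <= this.bytes.length) return;
    let capacity = this.bytes.length;
    while (capacity < needed) capacity *= 2;
    const grown = new Uint8Array(capacity);
    grown.set(this.bytes.subarray(0, this.length));
    this.bytes = grown;
    this.view = new DataView(grown.buffer); // the view must track the new backing buffer
  }

  // Fixed 32-byte big-endian encoding: four 64-bit limbs, most significant first, no width branch.
  writeField(value: bigint): void {
    if (value < 0n || value >> 256n !== 0n) throw new RangeError('value does not fit in 32 bytes');
    this.ensure(32);
    const offset = this.length;
    const mask = 0xffffffffffffffffn;
    this.view.setBigUint64(offset, value >> 192n);
    this.view.setBigUint64(offset + 8, (value >> 128n) & mask);
    this.view.setBigUint64(offset + 16, (value >> 64n) & mask);
    this.view.setBigUint64(offset + 24, value & mask);
    this.length = offset + 32;
  }

  // Flat loop over field values: one capacity check up front, no per-element type dispatch.
  writeFields(values: readonly bigint[]): void {
    this.ensure(values.length * 32);
    for (const value of values) this.writeField(value);
  }

  slice(): Uint8Array {
    return this.bytes.slice(0, this.length);
  }
}

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, byte => byte.toString(16).padStart(2, '0')).join('');
}

const fieldValues = [1n, (1n << 255n) + 0xabn, 0n];
const fieldSink = new FieldSink();
fieldSink.writeFields(fieldValues);
const fieldHex = toHex(fieldSink.slice());

// Reference encoding through a hex round-trip, the style the limb writer avoids.
const referenceHex = fieldValues.map(value => value.toString(16).padStart(64, '0')).join('');
```

`DataView.setBigUint64` writes big-endian by default, so no byte swapping is needed for this layout.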

alexghr

🚀

Adds buffer_sink.test.ts covering the new BufferSink module: byte-for-byte equivalence of every sink writer against the legacy serializeBigInt/free_funcs, serializeToSink dispatch (mixed/nested/migrated/legacy-node fallback), capacity growth, reset reuse, overflow/negative guards, and a sink->BufferReader round-trip. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The arbitrary-width branch of writeBigInt used a per-byte BigInt shift loop, which benchmarked as the slowest option (slower than the legacy hex round-trip) because each byte allocates a fresh BigInt. Replace it with 64-bit setBigUint64 limbs written from the least-significant tail, plus a <=7-byte leftover head for widths that aren't a multiple of 8. Faster than the legacy path at every width; multiples of 8 (the only widths used in practice: 8/16/32) take the pure-limb path. The width===32 unrolled fast path is retained. Extends the width coverage in buffer_sink.test.ts.
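
The limb strategy described above can be sketched as a standalone function. This is an illustration under assumptions rather than the PR's code: it returns a fresh `Uint8Array` instead of writing into a sink, and it fills the leftover head with a short per-byte loop, which is one possible choice for at most 7 bytes.

```typescript
// Encode `value` as exactly `width` big-endian bytes, rejecting values that do not fit.
function writeBigIntSketch(value: bigint, width: number): Uint8Array {
  if (value < 0n || value >> BigInt(width * 8) !== 0n) {
    throw new RangeError(`value does not fit in ${width} bytes`);
  }
  const out = new Uint8Array(width);
  const view = new DataView(out.buffer);
  const mask = 0xffffffffffffffffn;
  let rest = value;
  let offset = width;

  // Full 64-bit limbs, taken from the least-significant end and written from the tail backwards.
  while (offset >= 8) {
    offset -= 8;
    view.setBigUint64(offset, rest & mask);
    rest >>= 64n;
  }

  // Leftover head of at most 7 bytes when width is not a multiple of 8.
  while (offset > 0) {
    offset -= 1;
    out[offset] = Number(rest & 0xffn);
    rest >>= 8n;
  }
  return out;
}

function bytesToHex(bytes: Uint8Array): string {
  return Array.from(bytes, byte => byte.toString(16).padStart(2, '0')).join('');
}

const sevenBytes = bytesToHex(writeBigIntSketch(0x23456789abcdefn, 7));
const twelveBytes = bytesToHex(writeBigIntSketch(0x0102030405060708090a0b0cn, 12));
const sixteenBytes = bytesToHex(writeBigIntSketch(1n, 16));
```

Widths that are multiples of 8 never enter the second loop, which corresponds to the pure-limb path mentioned in the commit message.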

fcarreiro · 2026-05-31T19:29:53Z


+// Per-instance sink size hint. Held externally (WeakMap) so it does not appear as an enumerable instance
+// field, which would otherwise make deep-equality assertions fail when one side has been serialized.
+const txSizeHints = new WeakMap<Tx, number>();


We should know this number right? Can we hardcode it even if approximate?

cc @ludamad

embarrassingly, I missed that it snuck this in. Will get a hardcoded number, makes sense

…constant

The per-instance WeakMap + process-wide largest-seen-size heuristic both existed only to pre-size the BufferSink the no-sink Tx.toBuffer() path allocates. The bootstrapped bench measures the actual Tx payloads at:
- private-only: 81763 bytes
- public-with-enqueued-calls: 129128 bytes

A single 131072-byte (128 KiB) presize covers both shapes without any doubling-growth ensure() resize on the cold path, and is the same allocation the WeakMap fast-path made on the steady-state hot path anyway.

Removing the hidden state matches Adam's review feedback and brings the bench numbers within noise of the WeakMap version:

variant          weakmap (prev)   constant (this)
private steady   220 us           ~244 us
public steady    325 us           ~351 us
private reused   167 us           ~176 us
public reused    266 us           ~276 us
private cold     ~275 us          ~273 us
public cold      ~445 us          ~427 us

Real-world Txs that exceed 128 KiB keep working: the sink falls back to its standard doubling growth, just paying the existing cost.
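
To make the presize argument concrete, the sketch below counts how many doubling resizes a sink performs for the two measured payload sizes. The `GrowableSink` class, the `countResizes` helper, the constant name, and the 1024-byte write chunks are all hypothetical; only the byte counts and the 128 KiB figure come from the commit message.

```typescript
const TX_SINK_PRESIZE = 131072; // 128 KiB, as quoted above (the constant name is an assumption)

// Hypothetical sink that counts its reallocations. Assumes a positive initial capacity.
class GrowableSink {
  private bytes: Uint8Array;
  private length = 0;
  resizes = 0;

  constructor(initialCapacity: number) {
    this.bytes = new Uint8Array(initialCapacity);
  }

  private ensure(extra: number): void {
    const needed = this.length + extra;
    if (needed <= this.bytes.length) return;
    let capacity = this.bytes.length;
    while (capacity < needed) capacity *= 2;
    const grown = new Uint8Array(capacity);
    grown.set(this.bytes.subarray(0, this.length));
    this.bytes = grown;
    this.resizes++;
  }

  writeBytes(data: Uint8Array): void {
    this.ensure(data.length);
    this.bytes.set(data, this.length);
    this.length += data.length;
  }
}

// Stream `payloadBytes` zero bytes into a sink in chunks of at most 1024 and report the resizes.
function countResizes(payloadBytes: number, initialCapacity: number): number {
  const sink = new GrowableSink(initialCapacity);
  const chunk = new Uint8Array(1024);
  for (let written = 0; written < payloadBytes; written += chunk.length) {
    sink.writeBytes(chunk.subarray(0, Math.min(chunk.length, payloadBytes - written)));
  }
  return sink.resizes;
}

const privatePresized = countResizes(81763, TX_SINK_PRESIZE);
const publicPresized = countResizes(129128, TX_SINK_PRESIZE);
const publicFrom1k = countResizes(129128, 1024);
const oversized = countResizes(140000, TX_SINK_PRESIZE);
```

Under these assumptions both measured payloads fit the presized sink with no resize, a sink starting at 1 KiB needs seven doublings to reach 128 KiB, and a 140000-byte payload costs the presized sink a single doubling.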

…ng value

The (0x0123456789abcdefn, 7) case was 57 bits (high byte 0x01) but width=7 only holds 56 bits. Legacy serializeBigInt silently truncates the high byte; the new writeBigInt is strict and throws to match its 32-byte path. Drop the overflowing high byte so the value fits, keeping the test's stated intent ("matches serializeBigInt byte-for-byte") aligned with both impls. The out-of-range strictness is already covered by the dedicated "rejects out-of-range bigints" block.
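
The bit arithmetic behind this fix can be checked directly. The `bitLength` and `fitsWidth` helpers are hypothetical and exist only for this illustration.

```typescript
// Number of bits needed to represent a non-negative bigint.
function bitLength(value: bigint): number {
  return value === 0n ? 0 : value.toString(2).length;
}

// True when a non-negative value fits in `widthBytes` bytes.
function fitsWidth(value: bigint, widthBytes: number): boolean {
  return value >= 0n && bitLength(value) <= widthBytes * 8;
}

const overflowing = 0x0123456789abcdefn; // 8 bytes; the high byte 0x01 contributes one bit
const trimmed = overflowing & ((1n << 56n) - 1n); // drop the high byte, keep the low 56 bits

const overflowingBits = bitLength(overflowing); // 57
const overflowingFits = fitsWidth(overflowing, 7); // false: 57 > 56
const trimmedFits = fitsWidth(trimmed, 7); // true
```

After masking, the value is 0x23456789abcdef, which fits in 7 bytes and is what a truncating encoder would have emitted for the original input.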

AztecBot added the claudebox (Owned by claudebox. it can push to this PR.) label May 29, 2026

AztecBot changed the title ~~spike: streaming .write serialization for the Tx.toBuffer recursive path~~ spike: streaming toBuffer(sink?) serialization for the Tx path (~11x) May 29, 2026

ludamad added the ci-draft (Run CI on draft PRs.) and ci-full (Run all master checks.) labels May 29, 2026

ludamad marked this pull request as ready for review May 29, 2026 22:34

AztecBot changed the title ~~spike: streaming toBuffer(sink?) serialization for the Tx path (~11x)~~ refactor(stdlib): streaming toBuffer(sink?) for the Tx path (~11x) May 29, 2026

AztecBot force-pushed the cb/spike-tx-write-interface branch 3 times, most recently from cfdb8a1 to 080c41d Compare May 29, 2026 23:19

AztecBot force-pushed the cb/spike-tx-write-interface branch from 080c41d to f7e7ba2 Compare May 30, 2026 12:26

AztecBot added 4 commits May 31, 2026 02:58

chore(stdlib): prettier on buffer_sink.ts

40d5299

ludamad changed the title ~~refactor(stdlib): streaming toBuffer(sink?) for the Tx path (~11x)~~ feat(yp): faster typescript serde May 31, 2026

ludamad changed the title ~~feat(yp): faster typescript serde~~ feat(yp): faster typescript serialization May 31, 2026

alexghr approved these changes May 31, 2026

ludamad approved these changes May 31, 2026

ludamad added the ci-squash-and-merge label May 31, 2026

ludamad enabled auto-merge May 31, 2026 16:32

ludamad added the claudebox (Owned by claudebox. it can push to this PR.) label and removed the ci-draft (Run CI on draft PRs.) and claudebox (Owned by claudebox. it can push to this PR.) labels May 31, 2026

AztecBot and others added 3 commits May 31, 2026 16:45

chore(yp): prettier on buffer_sink.test.ts

77cccff

ludamad approved these changes May 31, 2026

fcarreiro reviewed May 31, 2026

AztecBot wants to merge 10 commits into next from cb/spike-tx-write-interface

AztecBot commented May 29, 2026 • edited by ludamad

4 participants